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We describe optimized parallel tempering simulations of the 46-residue B -fragment of protein A. Native-like 
configurations with a root-mean- square deviation of 3 A to the experimentally determined structure (Protein 
Data Bank identifier IBDD) are found. However, at biologically relevant temperatures such conformations 
appear with only ^ 10% frequency in our simulations. Possible short comings in our energy function are 
discussed. 



I. INTRODUCTION 

Rational drug design or the pathology of amyloid diseases 
are only two problems whose solutions require a detailed un- 
derstanding of the relation between chemical composition and 
structure (and function) of proteins. Exploring this relation- 
ship through numerical simulations is a computationally hard 
problem. Two major factors limit our ability to efficiently sim- 
ulate large proteins and study their folding transitions. First, 
statistically sampling the rough energy landscape of a protein 
can be extremely slow even at room temperature. Second, 
present energy functions are often insufficiently accurate in 
describing the interactions between the atoms within a pro- 
tein, and between protein and their surrounding solvent. It is 
often not clear whether the failure of a computer experiment 
to find the known structure of a protein results from poor sam- 
pling or lack of accuracy in the energy function. 

To overcome some of the limitations of statistical sampling 
in the simulation of small proteins, sophisticated simulation 
schemes such as parallel tempering 1 1, 2] or generalized en- 
semble methods 10, 0] are now widely employed numeri- 
cal methods |3, fiul] In a recent line of research feedback- 
optimized algorithms have been developed that aim at fur- 
ther improving the statistical sampling of these methods by 
systematically improving the simulated statistical ensemble 
e.g. th e exact placement of replicas in temperature space 
illSllll]- Recent simulations of the 36-residue villin head- 
piece sub-domain HP- 3 6 in Ref. JO demonstrated that opti- 
mizing the sampled temperature distribution leads to qualita- 
tively different results for the same force field, in this case a 
combination of the ECEPP/3 force field 1 12] with an implicit 
solvent Oil . Previous simulations of HP-36 in Ref. il4ll had 
indicated that the native structure is not the global free energy 
minimum at room temperature for this force field. However, 
with the optimized temperature distribution it was later found 
that the correct structure is sampled with 90% frequency at 
room temperature ifToll . Clearly, the earlier numerical simula- 
tions suffered from a sampling problem and resulted in mis- 
leading conclusions on the force field, while the energy func- 
tion in fact accurately described the interactions for HP-36 as 
shown in the latter study with improved sampling. While this 
result is promising, the observed sampling difficulties never- 
theless suggest that the energy landscape for HP-36 modeled 
by this force field may be more rough than the one experi- 



enced by the "real" protein. Given the complexity of the en- 
ergy landscape for this small protein, one would expect that 
with increasing size and complexity of the molecule the ac- 
curacy of the energy function will further decrease as the en- 
ergy landscape gains further complexity. This would render it 
even more difficult (or simply impossible) to pick the correct 
structure in the numerical simulation of larger proteins. As 
stable domains in proteins usually consists of 50-200 residues 
it is therefore important to test the accuracy of current energy 
functions for proteins of this size. As a first step in this direc- 
tion we have applied the feedback-optimized parallel temper- 
ing scheme to simulate the 46-residue fragment 10-55 of the 
B domain of protein A (Protein Data Bank identifier IBDD) 
which forms the three-helix bundle displayed in Fig. [T] as de- 
termined in experiments |15]. This is one of the few small 
proteins with experimentally well-characterized states. For 
this reason, it has raised some interest as a model to test fold- 
ing algorithms or energy functions III, [HI Hill, 113, III III- 
While our high-statistics simulations do find native-like con- 
figurations with a root-mean- square deviation of ^ 3 A to the 
experimentally determined structure, these configurations do 
not correspond to the lowest energy configuration and appear 
with only around 10% probability at biologically relevant tem- 
peratures. Our results indicate that this deviation from exper- 
iment is due to a bias in the ECEPP/3 energy function toward 
helical structures. Another contributing factor is the use of an 




FIG. 1: The experimentally determined structure of the 46-residue B- 
fragment of protein A as stored in the Protein Data Bank (identifier 
IBDD). 
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implicit solvent model in our simulations. These models re- 
duce dramatically the numerical costs of protein simulations 
but can lead to distorted free energy landscapes. This effect 
has been observed earlier [23, "2?] even for more sophisticated 
implicit solvents than used by us. 



II. METHODS 

Our simulations of the protein A fragment utilize the 
ECEPP/3 force field [12] as implemented in the 2005 version 
of the program package SMMP ||25i |2^ . Here, the interac- 
tions between the atoms within the protein are approximated 
by a sum £^ecepp/3 consisting of electrostatic energy Ec, a 
Lennard- Jones term Elj, a. hydrogen-bonding term E^i) and 
a torsion energy Etor- 

£^ECEPP/3 = Ec -\-ELJ-\-Ehb-\-Etor 
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where r^j is the distance between the atoms i and j, is the 
l-th torsion angle, and energies are measured in kcal/mol. The 
protein- solvent interactions are approximated by a solvent ac- 
cessible surface term 
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FIG. 2: Time series of visited temperatures for one of the replicas in 
the parallel temperature simulation. Time is measured in MC sweeps 
and starts with the begin of measurements. 
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The sum goes over the solvent accessible areas Ai of all 
atoms i weighted by solvation parameters cr^ as determined 
m Ref.m a common choice when the ECEPP/3 force field is 
utilized. 

The above defined energy function leads to an energy land- 
scape that is characterized by a multitude of minima separated 
by high barriers. In order to improve statistical sampling and 
speed up equilibration at low temperatures we utilize a par- 
allel tempering scheme IH 0] which was first used in protein 
science in Ref. 1^27']. In this scheme TV non-interacting copies, 
or "replicas", of the protein are simultaneously simulated at a 
range of temperatures {Ti, T2, . . . , Tat}. After a fixed num- 
ber of Monte Carlo sweeps (or a molecular dynamics run for 
a fixed time interval) a sequence of swap moves, the exchange 
of two replicas at neighboring temperatures, Ti and T^+i, is 
suggested and accepted with a probability 

p{Ei,Ti ^ ^,+i,T,+i) = min(l,exp(A/3A^)) , (3) 

where A/3 = l/Tj+i — is the difference between the 
inverse temperatures and /S.E = — Ei is the difference 



FIG. 3: The specific heat C{T) as function of temperature T. A 
peaked signal is found at the helix-coil transition around Tc = 515 K 
where the helicity uh (T) increases, see the inset. Below the transi- 
tion a shoulder emerges over a broad temperature regime in which the 
average end-to-end distance de-e of the protein decreases rapidly as 
shown in the inset, an indication that the helices form a secondary 
structure. 



in energy of the two replicas. The exchange of conforma- 
tions considerably improves equilibration for all replicas, es- 
pecially those at low temperatures which can have extremely 
long equilibration times for conventional canonical simula- 
tions (at a fixed temperature). 

The improved equilibration of the replica exchange scheme 
is due to the random walks that individual replicas can per- 
form in in temperature space allowing them to move to higher 
temperatures where equilibration is fast and then move back 
down to lower temperatures thereby escaping barriers in the 
energy landscape. Obviously, the number of round-trips rirt 
between the lowest and highest temperature, Ti and Tn, re- 
spectively, is a lower bound for the statistically independent 
visits at the lowest temperature, and therefore a good measure 
for the equilibration of the parallel tempering simulation. It 
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is therefore desirable to maximize the number of round trips 
by optimizing the position of temperature points in the inter- 
val [Ti, Tat] . This can indeed be achieved in a systematic way 
by feeding back the local diffusivity of the described random 
walk using the feedback algorithm described in Refs. 
Technically, replicas are "labeled" according to which of the 
two extremal temperatures, Ti or Tat, they have visited last. 
Using this label we can define the number of replicas nup(^) 
(^down(O) which at temperature Ti have come from Ti (Tat). 
Then the fraction of replicas moving in one direction 



/up(0 = 



(4) 



describes a stationary distribution of probability flow between 
temperatures Ti and Tn with boundary conditions /up(l) = 1 
and fup{N) = 0. The local diffusivity D{T) of the random 
walk which a single replica performs in temperature is then 
given by D{T) oc AT • {df/dT)'^, where AT is the size 
of the temperature interval around T. Feeding back this local 
diffusivity it was shown in Refs. ^, TO that for the optimal tem- 
perature distribution, i.e. the one that maximizes the number 
of round trips urt, the fraction decreases linearly 
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For our simulations of the protein A fragment such an opti- 
mized distribution can be found by applying the iterative pro- 
cedure described in Refs. 9, 10 and is given for the 24 replicas 
in our simulations by 1000, 768, 673, 624, 574, 552, 539, 
529, 521, 515, 509, 501, 490, 467, 441, 419, 401, 386, 374, 
362, 350, 336, 317, 250. Our data are taken from a simu- 
lation with 1,000,000 sweeps for each replica. Here, a sweep 
consists of a sequential series of attempts to update each of the 
276 dihedral angles (the true degrees of freedom in our model) 
once. After each sweep, we attempt an exchange move (swap) 
of configurations between neighboring temperatures which is 
accepted with probability ©. The walk of one replica along 
the ladder of temperatures is displayed in Fig. O Measure- 
ments are taken every ten sweeps and stored for further analy- 
sis. These include the energy E, the radius of gyration Tgy as 
a measure of the geometrical size, and the number of helical 
residues uh, i.e. residues where the pair of dihedral angles 
((/), iIj) takes values in the range (-70° ± 30°, -37° ± 30°). 
Finally we recorded the configurations with overall lowest en- 
ergy obtained in our simulation. 



III. RESULTS AND DISCUSSIONS 

The biologically active state of a protein is thought to be the 
global minimum of the free energy at room temperature. Heat- 
ing leads to unfolding that is reversible after cooling. Hence, 
the folding transition should be marked by a signal in the spe- 
cific heat 
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FIG. 4: Frequency of (a) the radius of gyration rgy and (b) the end-to- 
end distance de-e for various temperatures. Each data point samples 
entries of a bin with 1 A width. The histograms are normalized. 



with P = l/ksT {ks is the Boltzmann constant) and N the 
number of residues. For protein A we indeed find a pro- 
nounced peak in the specific heat, displayed in Fig. [3l at a 
temperature Tc = 515 K. This peak is related to the formation 
of a-helices as one can see from the inset where the average 
number uh of residues which are part of an a-helix is plotted 
versus temperature. Around the transition temperature Tc the 
helicity rapidly increases. The corresponding formation of hy- 
drogen bond between residues (i, z + 4) leads to a much lower 
energy of such configurations, and the resulting fluctuation in 
the average energy as function of temperature is measured by 
the specific heat. Below this helix-coil transition a shoulder is 
observed in the specific heat in a temperature range between 
330 — 430 K. Since the helicity varies only little in this tem- 
perature range, but the typical end-to-end distance de-e of the 
protein A configurations decreases rapidly - as shown in the 
inset of Fig. [3]- this temperature regime is marked by the for- 
mation of a secondary structure of the helical segments which 
we will discuss in detail below. As the temperature is further 
decreased below T ^ 330 K, the specific heat lowers again 
and the end-to-end distance approaches a constant value. 
A more detailed picture of the formation of a compact 
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FIG. 5: Typical extended low-energy structure with high helical con- 
tent (Type I). 



FIG. 6: One of the typical triangle shaped compact structures char- 
acterized by small end-to-end distance de-e (Type II). 



secondary structure emerges when looking at the frequency 
of configurations with typical end-to-end distances de-e and 
their respective radius of gyration Tgy , a measure for the com- 
pactness of a folded protein structure, which are both plotted 
versus temperature in Fig. (H At temperatures directly below 
the helix coil transition, e.g. our data point at T = 440 K, 
the histograms indicate that configurations still differ widely 
as seen from the broad distributions found in both histograms, 
but with a clear preference for extended structures with a ra- 
dius of gyration centered around r^^ = 15 A. In the shoulder 
region of the specific heat, e.g. at our data points T = 400 K 
and T = 387 K, a double peak appears in histogram of the 
radius of gyration indicating the competion between extended 
structures (rgy ^ 15 A) and compact ones (rgy ^ 11 A). Fur- 
ther decreasing the temperature we are finally left with com- 
pact configurations only (with rgy < 11 A). The histograms 
in the radius of gyration seem to indicate a compactification 
transition where distinct secondary structures are formed at 
around 387 — 400 K that separates compact helical structures 
from extended helical structures immediately below the helix- 
coil transition around Tc = 515 K. Indeed, we find that two 
different compact structures are primarily formed below this 
compactification transition as becomes evident from our mea- 
surements of frequencies of typical end-to-end distance for 
various temperatures. Above the compactification transition 
and below the helix-coil transition the histogram of typical 
end-to-end distances is first centered around de-e = 43 A 
for T = 440 K and an almost flat histogram is observed at 
T = 400 K for 4-e = 9 - 40 A. Below T = 387 K two 
additional peaks form around de-e — 9 A and de-e = 23 
A, while there is still a broad feature around de-e = 40 A. 
Further lowering the temperature, the two peaks de-e = 9 A 
and de-e = 23 A further proliferate and become the dominant 
feature in the histogram. Finally, towards the lowest tempera- 
ture T = 250 K which we sampled, only configurations with 
small end-to-end distance prevail with more than 90% of all 



sampled configurations having a typical end-to-end distance 

of de-e < 10 A. 

The above analysis of the histograms of typical radius of 
gyration and end-to-end distance indicate that below the helix- 
coil transition temperature of Tc = 515 K there exist three 
different types of configurations which all have high helix- 
content (uh ^ 32 — 40) but differ strongly in their respec- 
tive arrangement of the helical segments. Two of these struc- 
tures are compact, while one is extended and found only above 
the compactif cation transition around T ^ 387 K. A typical 
example for the extended helical structure - named "struc- 
ture I" - is shown in Fig. \5\ The illustrated configuration 
has a radius of gyration of r^^ = 13.5. A, an end-to-end 
distance of de-e = 34.0 A, and a solvent- accessible sur- 
face area of As as = 4440 A^. All three helical segments 
are formed, but the total helicity is with 39 helical residues 
higher than the one found for the PDB structure where only 
34 residues are part of an a-helix. Accordingly, its root- 
mean square deviation (rmsd) to the native structure (as de- 
posited in the Protein Data Bank under the identifier IBDD) 
is 11.5 A (calculated over backbone atoms only). The total 
energy after minimization is Etot = —643 kcal/mol, of which 
^soiv = — 186 kcal/mol result from the solvation energy and 
^ECEPP/3 = —457 kcal/mol from the intramolecular interac- 
tions. Fig. [6] displays the first of the two compact structures 
found below the compactification transition. Named "struc- 
ture 11" it is the configuration with lowest overall energy found 
in our extensive simulations. After minimization we have 
^tot = —692 kcal/mol, with the energy difference to the pre- 
vious structure resulting from the more favorable intramolec- 
ular interactions (£^ecepp/3 = —510 kcal/mol) while there is 
little difference in the solvation term, Egoiv = — 182 kcal/mol. 
The small end-to-end distance of de-e = 5.7 A reflects a tri- 
angle shape of this configuration which differs from the na- 
tive one by a rmsd of 7.5 A and is with 41 residues maximal 
helical. The configuration is with Tgy = 11.3 A and a a sol- 
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FIG. 7: Another typical compact structure, but with larger end-to-end 
distance de-e (Type III). This structure resembles the native structure 
as deposited in the PDB and shown in Fig. 1. 

vent accessible surface area As as = 3950 more compact 
than the extended structure (type I) but not as compact as the 
native one which has Tgy = 9.7 A, and a solvent- accessible 
surface area As as = 3333 A^. A closer resemblance to the 
native structure has the second type of compact configurations 
found in our simulations, named "structure III" and illustrated 
in Fig. [71 This structure differs from the other compact, tri- 
angle shaped structure (" structure 11") in that it is more com- 
pact but has a smaller helicity with only 37 residues which are 
part of an a-helix. For this configuration we have measured 
a radius of gyration of r^^y = 10.4 A, a sol vent- accessible 
surface area As as = 3780 A^, and an end-to-end distance 
de-e = 23.0 A. Strikingly, the rmsd to the native structure is 
only 3.3 A. However, the total energy Etot = —662 kcal/mol 
is by about 30 kcal/mol higher than the one of the other com- 
pact structure shown in Fig.[6l This is similar to Ref. 22 where 
the best sampled structure has a rmsd of 2.33 A to the native 
one, but the lowest energy configuration differs by 6.41 A. 
In our case, the difference for the two compact structures is 
primarily due to the ECEPP/3 energy which for the second 
compact structure is found to be £^ecepp/3 = —484 kcal/mol 
while both compact structure have similar solvation energy, 
Esoiv = —178 kcal/mol for the second compact structure. 
This may indicate shortcomings of our implicit solvent. This 
conjecture is supported by Ref. 19 where native-like configu- 
rations are observed as free energy global minimum in simu- 
lations with an explicit solvent. 

Note that the triangle- shaped compact configuration 
("structure 11") in Fig. [6] has a total energy that is ^ 
30 kcal/mol lower than the lowest-energy configuration found 
in previous simulations Ref. [28| with the same force field. 
These simulations reported a lowest-energy configuration 
where the middle helix is broken at residue Gly21, i.e. the 
configuration is built out of four helical segments. Similar 
configurations are also observed in our simulations and are 
found to have energies comparable to the ones found previ- 
ously in Ref. 28. Depending on the arrangement of the he- 
lices they either resemble the triangle shaped structure of type 
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FIG. 8: Frequency of the three dominating helical low-energy struc- 
tures as function of temperature. 



"11" or the other compact structure of of type "III", and are 
grouped together with those structures in our analysis of the 
distribution of varying structures as a function of temperature. 

Finally, we note that all three structures are built out of he- 
lical segments and therefore only observed with substantial 
frequency at and below the helix-coil transition at Tc. The 
extended structure (type I) is most frequent at temperatures 
between 441 — 467 K. Its probability decreases with tempera- 
ture and at T = 317 K only about 1 % of the configurations be- 
long to type I. Compact configurations (type II and III) apear 
with a frequency of more than 1 % only for temperatures be- 
low T = 490 K. The native-like configurations of type III are 
slightly more frequent than the triangle- shaped of type II for 
a small temperature window above T = 362 K where their 
frequency reaches with 23% a maximum, see Fig.[8j Further 
decreasing the temperature the frequency of configurations of 
type III diminishes while configurations of type II become 
more frequent. At T = 317 K only 12% of the configurations 
are native-like, but already 62% belong to type II. While the 
entropy of the triangle shaped structure (type II) is certainly 
lower than the entropy of the native structure (which might 
explain the suppression of type-II structures at higher temper- 
atures), the entropic gain of the native-like structure is com- 
pensated by an energetic gain of the maximally helical trian- 
gle structure at lower temperatures making the triangle shaped 
structure the predominant one. This observation clearly re- 
flects a bias in the ECEPP/3 energy function toward helical 
structures. Growth of helices is energetically favored which in 
turn strongly restricts the possible topologies the folded struc- 
tures can assume. This leads to the observed dominance of 
structures that while less compact and larger solvent accessi- 
ble surface are are more helical than the one observed in ex- 
periment. Note that such a thermodynamic bias has not been 
observed in Ref. 21 where also configurations similar to the 
ones in Fig. [5]l7] were observed during the simulation. This 
indicates that this coarse-grained model employed in Ref. Illl 
(and the one of Ref. 1 8) captures the effective interactions in 
protein A better than our all- atom energy function. Hence, we 
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should contemplate possible corrections to our energy func- 
tions which should decrease the helix-forming tendencies. We 
are currently exploring this conjecture with a variant of the 
ECEPP force field proposed by Abagyan and co-workers [i29i1 . 

IV. CONCLUSIONS 

We have performed parallel tempering simulations of the 
46-residue B -fragment of protein A with an optimized temper- 
ature distribution and high statistics. Our goal was to test the 
limitations set on protein simulations by our energy function, 



the ECEPP/3 force field with an implicit solvent. While we 
find native-like structures with ^ 3 A rmsd, these structures 
appear at room temperature with only ^10% probability. En- 
ergetically favored are compact structures that are maximal 
helical, and more exposed to the surrounding solvent while 
less compact than the native structure. This observation sug- 
gests the need for correction terms to the ECEPP/3 force field 
which decrease the helix-forming bias when using the force 
field in combination with an implicit solvent. 
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